Using Pyndamics to Perform Bayesian Parameter Estimation in Dynamical Systems

Pyndamics provides a way to describe a dynamical system in terms of differential equations or the stock-flow formalism. It is a wrapper around SciPy's odeint function, with added functionality for time plots, phase plots, and vector fields. The MCMC component of this package uses pymc (currently version 2): http://pymc-devs.github.io/pymc/.

Page for this package: https://code.google.com/p/pyndamics/
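To see what pyndamics is wrapping, here is a minimal sketch of integrating the same kind of growth equation directly with SciPy's odeint (the parameter values and time grid here are illustrative, not pyndamics defaults):

```python
import numpy as np
from scipy.integrate import odeint

def rhs(mice, t, b, d):
    # right-hand side of mice' = b*mice - d*mice
    return b * mice - d * mice

t = np.linspace(0, 5, 51)
mice = odeint(rhs, 2.0, t, args=(1.1, 0.08))[:, 0]
```

Pyndamics builds this machinery for you from the equation string, and adds the plotting and data-fitting layers on top.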


In [1]:
from pyndamics import Simulation
from pyndamics.mcmc import MCMCModel

Artificial Example with a Mouse Population


In [15]:
data_t=[0,1,2,3]
data_mouse=[2,5,7,19]

sim=Simulation()                    # get a simulation object

sim.add("mice'=b*mice - d*mice",    # the equations
        2,                          # initial value
        plot=True)                  # display a plot, which is the default

sim.add_data(t=data_t,mice=data_mouse,plot=True)
sim.params(b=1.1,d=0.08)            # specify the parameters
sim.run(5)



In [16]:
model=MCMCModel(sim,{'b':[0,10]})
model.fit(iter=25000)


 [-----------------100%-----------------] 25000 of 25000 complete in 29.2 sec
Time taken: 29.45 s

In [17]:
model.b


Out[17]:
0.8212901262612116

In [18]:
sim.run(5)



In [19]:
model.plot_distributions()



In [24]:
model.sigma


Out[24]:
{'b': 0.03545979922106901, 'std_dev': 2.1945403677485853}

In [25]:
print "The parameter b has best estimate",model.mu['b'],' +/- ',model.sigma['b']


The parameter b has best estimate 0.818476636275  +/-  0.0354597992211
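For context, MCMCModel.fit is conceptually a Metropolis random walk over the parameters. A hand-rolled sketch of the same idea for this example, assuming a Gaussian likelihood with a fixed width, a flat prior on b over [0, 10], and the analytic solution of the ODE (so no integrator is needed):

```python
import numpy as np

rng = np.random.default_rng(42)
data_t = np.array([0, 1, 2, 3], float)
data_mice = np.array([2, 5, 7, 19], float)
d = 0.08                            # death rate held fixed, as in the notebook

def log_post(b, sigma=2.0):
    # flat prior on [0, 10]; Gaussian likelihood around the analytic
    # solution mice(t) = 2*exp((b - d)*t)
    if not 0.0 < b < 10.0:
        return -np.inf
    pred = 2.0 * np.exp((b - d) * data_t)
    return -0.5 * np.sum((pred - data_mice) ** 2) / sigma ** 2

b, trace = 1.0, []
for _ in range(20000):
    proposal = b + rng.normal(0.0, 0.1)      # random-walk proposal
    if np.log(rng.random()) < log_post(proposal) - log_post(b):
        b = proposal                         # accept
    trace.append(b)
samples = np.array(trace[5000:])             # discard burn-in
```

The posterior mean of `samples` lands near the 0.82 estimate above; pymc adds adaptive step sizes, multiple parameters, and convergence diagnostics on top of this basic loop.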

A linear growth example

Data from http://www.seattlecentral.edu/qelp/sets/009/009.html

Plot the data


In [15]:
from numpy import array
from matplotlib.pyplot import plot, xlabel, ylabel

t=array([7,14,21,28,35,42,49,56,63,70,77,84],float)
h=array([17.93,36.36,67.76,98.10,131,169.5,205.5,228.3,247.1,250.5,253.8,254.5])

plot(t,h,'-o')
xlabel('Days')
ylabel('Height [cm]')


Out[15]:
<matplotlib.text.Text at 0x109eab990>

Run an initial simulation

Here, the constant growth rate ($a=1$) is hand-picked and doesn't fit the data particularly well.


In [16]:
sim=Simulation()
sim.add("h'=a",1,plot=True)
sim.add_data(t=t,h=h,plot=True)
sim.params(a=1)
sim.run(0,90)



Fit the model parameter, $a$

We specify the prior probability distribution for $a$ as uniform between $-10$ and $10$.


In [17]:
model=MCMCModel(sim,{'a':[-10,10]})
model.fit(iter=25000)


 [-----------------100%-----------------] 25000 of 25000 complete in 11.5 sec
Time taken: 11.59 s

What is the best fit parameter value?


In [18]:
model.a


Out[18]:
3.5428411581168153
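As a sanity check (not part of the original notebook): with the initial value fixed at 1, the model is just $h(t) = 1 + a t$, so the flat-prior best fit should agree with the closed-form least-squares slope:

```python
import numpy as np

t = np.array([7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84], float)
h = np.array([17.93, 36.36, 67.76, 98.10, 131, 169.5,
              205.5, 228.3, 247.1, 250.5, 253.8, 254.5])

# minimize sum((1 + a*t - h)^2) over a  =>  a = sum(t*(h-1)) / sum(t^2)
a_ls = np.sum(t * (h - 1)) / np.sum(t ** 2)
```

This should land very close to the MCMC estimate above.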

Rerun the model


In [19]:
sim.run(0,90)



Plot the posterior histogram


In [20]:
model.plot_distributions()


Plot the posterior histogram with the normal approximation


In [21]:
model.plot_distributions(show_normal=True)


Fit the model parameter, $a$, and the initial value of the variable, $h$


In [23]:
model=MCMCModel(sim,{'a':[-10,10],'initial_h':[0,18]})
model.fit(iter=25000)


 [-----------------100%-----------------] 25000 of 25000 complete in 18.5 sec
Time taken: 18.68 s

In [25]:
sim.run(0,90)
model.plot_distributions(show_normal=True)



Plot the simulations for many samples of the model parameters


In [26]:
sim.noplots=True  # turn off the simulation plots
for i in range(500):
    model.draw()
    sim.run(0,90)
    plot(sim.t,sim.h,'g-',alpha=.1)
sim.noplots=False  # gotta love a double-negative
plot(t,h,'bo')  # plot the data


Out[26]:
[<matplotlib.lines.Line2D at 0x109e71a50>]

Logistic Model with the Same Data


In [33]:
t=array([7,14,21,28,35,42,49,56,63,70,77,84],float)
h=array([17.93,36.36,67.76,98.10,131,169.5,205.5,228.3,247.1,250.5,253.8,254.5])

sim=Simulation()
sim.add("h'=a*h*(1-h/K)",1,plot=True)
sim.add_data(t=t,h=h,plot=True)
sim.params(a=1,K=500)
sim.run(0,90)

# fig=sim.figures[0]
# fig.savefig('sunflower_logistic1.pdf')
# fig.savefig('sunflower_logistic1.png')


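As an aside, the logistic equation has a closed-form solution, which is handy for checking the numerical integration (a generic formula, not a pyndamics feature):

```python
import numpy as np

def logistic(t, a, K, h0):
    # closed-form solution of h' = a*h*(1 - h/K) with h(0) = h0
    return K / (1.0 + (K / h0 - 1.0) * np.exp(-a * t))
```

With the hand-picked values $a=1$, $K=500$, $h_0=1$, the curve saturates near $K$ well before day 90, which is why the initial simulation overshoots the data.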

Fit the model parameters, $a$ and $K$, and the initial value of the variable, $h$


In [64]:
model=MCMCModel(sim,{'a':[0.001,5],'K':[0.1,500],'initial_h':[0,20]})
model.fit(iter=25000)
print "parameter a has the best-fit value",sim.a
print "parameter K has the best-fit value",sim.K
print "initial value of the variable h has the best-fit value",sim.initial_h


 [-----------------100%-----------------] 25000 of 25000 complete in 188.5 sec
Time taken: 3 m, 10.61 s
parameter a has the best-fit value 0.0875844539644
parameter K has the best-fit value 261.125897266

Plot the Results


In [67]:
sim.run(0,90)

fig=sim.figures[0]
fig.savefig('sunflower_logistic2.pdf')
fig.savefig('sunflower_logistic2.png')

model.plot_distributions(show_normal=True,separate_figures=False,figsize=(20,20))

# fig=gcf()
# fig.savefig('sunflower_posteriors.pdf')
# fig.savefig('sunflower_posteriors.png')



Plot the joint distribution between parameters, $a$ and $K$


In [36]:
model.plot_joint_distribution('a','K',show_prior=False)
# fig=gcf()
# fig.savefig('sunflower_joint.pdf')
# fig.savefig('sunflower_joint.png')


Showing the joint distribution together with the prior makes clear how much the inference has constrained the final best estimates of the parameters.


In [37]:
model.plot_joint_distribution('a','K',show_prior=True)


Plot the 95% credible interval for the dynamics of the system


In [39]:
plot(t,model['h_data'].value,'o')
plot(t,model['h'].stats()['mean'],'--')
plot(t,model['h'].stats()['95% HPD interval'], 'r:')
xlabel('Days')
ylabel('Height [cm]')

fig=gcf()
fig.savefig('sunflower_CI.pdf')
fig.savefig('sunflower_CI.png')
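The "95% HPD interval" reported by pymc is the narrowest interval containing 95% of the posterior samples. A generic sketch of computing one from raw samples (not pymc's implementation, just the idea):

```python
import numpy as np

def hpd(samples, mass=0.95):
    # narrowest window of consecutive sorted samples containing `mass` of them
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.ceil(mass * n))
    widths = x[k - 1:] - x[:n - k + 1]   # width of every candidate window
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]
```

Unlike a central (equal-tailed) interval, the HPD interval stays sensible for skewed posteriors, which is why it is the usual summary for MCMC output.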